Kenneth Tay
Oct 17, 2017
ggplot2
dplyr
select, mutate, arrange, filter, summarize, group_by

The most important syntax in R is the function call. All R syntax has function calls underlying it.
function_name(<inputs to the function>,
              <arguments which change
               how the function operates>)

x <- c(-5, -3, -1, 1, 3, NA)
mean(x)
## [1] NA
mean(x, na.rm = TRUE)
## [1] -1

abs(x): If x is positive, return x. If x is negative, return x without the negative sign.
mean(abs(x), na.rm = TRUE)
## [1] 2.6
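To make the claim that all R syntax has function calls underlying it a little more concrete, here is a small sketch that is not part of the original slides. It reuses the vector x from above; the variable names (sum_infix, third_prefix and so on) are made up for illustration. The backtick forms are standard base R.

```r
x <- c(-5, -3, -1, 1, 3, NA)

# Infix operators are ordinary functions that can also be called by name.
sum_infix  <- 1 + 2
sum_prefix <- `+`(1, 2)

# Subsetting with square brackets is a function call as well.
third_infix  <- x[3]
third_prefix <- `[`(x, 3)

# Arguments can be matched by position or by name; named arguments
# may appear in any order.
mean_positional <- mean(x, na.rm = TRUE)
mean_named      <- mean(na.rm = TRUE, x = x)

print(identical(sum_infix, sum_prefix))
print(identical(third_infix, third_prefix))
print(identical(mean_positional, mean_named))
```

Each pair is the same function call written in two ways, so each comparison should print TRUE.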
dplyr syntax without %>%
Take the mtcars dataset, select just the wt and mpg columns:
library(dplyr)
data(mtcars)
select(mtcars, wt, mpg)

Take the mtcars dataset, select just the wt and mpg columns, then keep only the rows with mpg < 15:
filter(select(mtcars, wt, mpg), mpg < 15)
R evaluates this call from the inside out, but the code reads from the outside in.

dplyr syntax with %>%
Take the mtcars dataset, select just the wt and mpg columns, then keep only the rows with mpg < 15:
mtcars %>% select(wt, mpg) %>% filter(mpg < 15)

mtcars %>%
    select(wt, mpg) %>%
    filter(mpg < 15)

Moral: dplyr can be used without %>%, but %>% makes code much more intuitive.
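As a quick check on the moral above, the following sketch (an addition, not from the original slides) runs the nested and the piped versions side by side and compares the results. The names nested and piped are made up for illustration. It relies on x %>% f(y) being evaluated as f(x, y).

```r
library(dplyr)

# Nested form: select() runs first, then filter() is applied to its result.
nested <- filter(select(mtcars, wt, mpg), mpg < 15)

# Piped form: the left-hand side becomes the first argument of the next call.
piped <- mtcars %>%
    select(wt, mpg) %>%
    filter(mpg < 15)

# Both forms should give the same data frame.
print(identical(nested, piped))
```

The pipe changes only how the code reads, not what it computes.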
ggplot2 syntax
The + operator is meant to mimic how the data analyst thinks when making a plot: adding things to the plot one at a time.
library(ggplot2)
ggplot()

ggplot() +
    geom_point(data = mtcars, mapping = aes(x = wt, y = hp))

ggplot() +
    geom_point(data = mtcars, mapping = aes(x = wt, y = hp)) +
    labs(title = "Horsepower vs. Weight", x = "Weight",
         y = "Horsepower")

ggplot() +
    geom_point(data = mtcars, mapping = aes(x = wt, y = hp)) +
    labs(title = "Horsepower vs. Weight", x = "Weight",
         y = "Horsepower") +
    theme_classic()

Geometries need data and mappings. If a geometry does not find them inside its own parentheses, it looks for them in the ggplot() call.
ggplot(data = mtcars, mapping = aes(x = wt, y = hp)) +
    geom_point() +
    labs(title = "Horsepower vs. Weight", x = "Weight",
         y = "Horsepower") +
    theme_classic()

readr and tidyr
“Official” cheat sheet for readr and tidyr available here.
dplyr), you have to reload them

There’s a water section! Looks promising…
Drought statistics! But what’s with that description… Let’s click on it anyway…
3 things stand out:
Always a good idea to preview before downloading. Let’s click on the “Preview”…
ReleaseDate, ValidStart & ValidEnd??
None & Stat..??
None represents?

After a couple of clicks…
Looks like what we want!
I’ve saved you the trouble by going through all these steps: you can download the csv file from Canvas (under Files, in the Session 5 folder).
Different packages for working with different data formats
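Delimited text files such as CSV (readr)
Excel spreadsheets (readxl)
SPSS, Stata and SAS files (haven)
Databases (DBI)
JSON (jsonlite)
XML (xml2)
Web APIs (httr)
HTML, i.e. web scraping (rvest)

tidyr verbs: gather and spread
gather: Used when some column names are not variables, but values of a variable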
spread: Opposite of gather
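A minimal sketch of gather and spread, added here for illustration. The table wide and its values are made up: its column names 1999 and 2000 are really values of a "year" variable, which is the situation gather is meant for. Newer versions of tidyr also provide pivot_longer and pivot_wider for the same job, but gather and spread are still available.

```r
library(tidyr)

# Toy data (made-up values): the year columns are values, not variables.
wide <- data.frame(
    country = c("A", "B"),
    `1999`  = c(1, 2),
    `2000`  = c(3, 4),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# gather: collapse the year columns into key/value pairs.
long <- gather(wide, key = "year", value = "cases", `1999`, `2000`)
print(long)

# spread: the opposite, turning the key/value pairs back into columns.
back <- spread(long, key = year, value = cases)
print(back)
```

Here long has one row per country-year pair, and back has the same columns and values as wide.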
tidyr verbs: separate and unite
separate: Used to separate values in one column into multiple columns
unite: Opposite of separate
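A minimal sketch of separate and unite, again an addition with made-up data. The data frame dates and its column names are hypothetical; the date strings are split on "-" into three columns and then pasted back together.

```r
library(tidyr)

# Toy data (made-up values): one column holding year, month and day together.
dates <- data.frame(
    id   = 1:2,
    date = c("2017-10-17", "2016-01-05"),
    stringsAsFactors = FALSE
)

# separate: split the date column into three columns at each "-".
parts <- separate(dates, col = date, into = c("year", "month", "day"), sep = "-")
print(parts)

# unite: the opposite, pasting the three columns back into one.
joined <- unite(parts, col = "date", year, month, day, sep = "-")
print(joined)
```

By default, separate keeps the pieces as character strings, so uniting them with the same separator recovers the original date column.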